Progress Memo 2

Final Project
Data Science 1 with R (STAT 301-1)

Author

Nikole Montero Cervantes

Published

November 29, 2023

Introduction

Data Overview

This dataset was acquired by scraping TripAdvisor (TA), a well-known travel website, for its restaurant information. It contains 1,083,397 restaurants across European countries and 42 variables, of which 25 are categorical and 17 are numerical. The raw datasets for Europe’s largest cities were then selected and combined for further examination.

It is important to note that this dataset includes only restaurants registered in the TripAdvisor database, so it might not cover every restaurant within a city.

Cleaning the data

To clean the data, I applied string manipulation functions and transformation techniques from the dplyr and stringr packages in R. The dataset underwent a series of refinements to make it tidier and to facilitate downstream analyses. Key steps in the cleaning process include:

• Variable Renaming

• Creating and Modifying Variables

• Handling Categorical Data

• Text Processing

• List Manipulation

• Numeric Extraction

• Data Filtering and Handling Missing Values
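The steps listed above could look roughly like the sketch below. It runs on a three-row toy data frame, not the real scrape: the raw column names (`Restaurant Name`, `Price Range`, `Price Level`, `Cuisines`), their formats, and the derived names `price_min`, `price_max` and `n_cuisines` are assumptions made for illustration, and the actual cleaning code may differ.

```r
library(dplyr)
library(stringr)

# Toy stand-in for the raw scrape; column names and value formats are hypothetical.
raw <- data.frame(
  `Restaurant Name` = c("  flunch ", "Don & Donna", "Le Chancel"),
  `Price Range`     = c("EUR 14 - EUR 17", "EUR 45 - EUR 70", NA),
  `Price Level`     = c("CHEAP", "EXPENSIVE", "MID-RANGE"),
  Cuisines          = c("French, European", "Greek, Mediterranean", "French"),
  check.names = FALSE
)

clean <- raw %>%
  # Variable renaming: switch to snake_case names
  rename(
    restaurant_name = `Restaurant Name`,
    price_range     = `Price Range`,
    price_level     = `Price Level`,
    cuisines        = Cuisines
  ) %>%
  mutate(
    # Text processing: trim stray whitespace and normalise capitalisation
    restaurant_name = str_to_title(str_squish(restaurant_name)),
    # Numeric extraction: first and last number in the price range
    price_min = as.numeric(str_extract(price_range, "\\d+")),
    price_max = as.numeric(str_extract(price_range, "\\d+$")),
    # Creating variables: midpoint of the range as the average price
    avg_price = (price_min + price_max) / 2,
    # Handling categorical data: ordered factor for price level
    price_level = factor(
      str_to_lower(price_level),
      levels = c("cheap", "mid-range", "expensive"),
      ordered = TRUE
    ),
    # List manipulation: split the cuisine string and count its entries
    n_cuisines = lengths(str_split(cuisines, ",\\s*"))
  ) %>%
  # Data filtering: drop rows whose price could not be parsed
  filter(!is.na(avg_price))

print(clean)
```

Keeping every step inside one pipeline makes it easy to see which bullet each transformation corresponds to, and to reorder or drop steps as the cleaning evolves.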

Starting the EDA

Univariate Analysis

To find patterns or unusual trends, I started by analyzing each variable in the dataset.

For Restaurants

Because there are too many observations to inspect individually, I looked at the 10 most common restaurant names.
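As a minimal sketch of this counting step, the most frequent names can be tallied with dplyr. The data frame below is a toy example; only the column names `restaurant_name` and `city` come from the report's tables, while `n_restaurants` and `n_cities` are names introduced here for illustration.

```r
library(dplyr)

# Toy data; the real dataset has over a million rows.
restaurants <- data.frame(
  restaurant_name = c("Flunch", "Flunch", "Flunch", "Le Chancel"),
  city            = c("Pau", "Tours", "Nantes", "Lyon")
)

top_names <- restaurants %>%
  group_by(restaurant_name) %>%
  summarise(
    n_restaurants = n(),              # how often the name appears
    n_cities      = n_distinct(city)  # in how many different cities
  ) %>%
  arrange(desc(n_restaurants)) %>%
  slice_head(n = 10)                  # keep the 10 most common names

print(top_names)
```

Counting distinct cities next to the raw count helps tell a chain (one name, many cities) apart from duplicated rows for a single restaurant.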

This bar plot shows that several restaurant names appear many times. Looking back at the data, I realized that although those restaurants share a name, they are all in different cities. Take Flunch as an example:

restaurant_name city
Flunch Franconville
Flunch Mers-les-Bains
Flunch Villebon-sur-Yvette
Flunch Le Quesnoy
Flunch Strasbourg
Flunch Bordeaux
Flunch Pau
Flunch Clermont-Ferrand
Flunch Tours
Flunch Besancon
Flunch Nantes
Flunch Poitiers
Flunch Avignon
Flunch Antibes
Flunch Roanne City
Flunch Epagny
Flunch Moulins
Flunch Macon
Flunch Montbeliard
Flunch Thionville
Flunch Boulogne-sur-Mer
Flunch Cholet
Flunch Amiens
Flunch Manosque
Flunch Vitrolles
Flunch Le Pontet
Flunch Saint-Jean-de-la-Ruelle
Flunch Herouville-Saint-Clair
Flunch Pertuis
Flunch Noyelles-Godault
Flunch Bonneuil-sur-Marne
Flunch Charleville-Mezieres
Flunch Chambery

Thus, I realized that those restaurants form a chain, which is why each name appears more than once. Something particularly interesting is that all the top restaurant chains are French. The chain with the highest number of restaurants is Leon de Bruxelles.

For Average Rating

This plot shows that the European restaurants on TripAdvisor, across 31 different countries, have high ratings, approximately between 4 and 4.8. This could suggest that the average quality offered in European restaurants is very good. This will be studied in more depth in the multivariate section.

For the Open Days Per Week

In this plot it is possible to see that most of the restaurants are open seven days a week, followed by six and five days per week. That makes sense, since restaurants generally need to be open five days or more to make a profit.

However, some restaurants are open four days or fewer, which is atypical. The impact of these few opening days will be explored in the multivariate section.

For Country

This plot displays the number of restaurants per country. France has the highest number of restaurants in this dataset, which could explain why the top 10 restaurant chains are French. Croatia and Finland are the countries with the fewest restaurants on TripAdvisor. France will be explored in more depth in a later section.

For Average Price

This histogram is right-skewed, with a mode around 20 to 30 euros. This could indicate that the majority of European restaurants on TripAdvisor are affordable and generally do not exceed 50 euros. However, there are some exceptions, visible as outliers in the boxplot, with prices ranging from 100 to 500 euros.
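To make the skew and the outliers concrete, here is a small base R sketch using the common 1.5 × IQR rule for flagging high values. The price vector is made up for illustration and is not taken from the dataset.

```r
# Toy average prices in euros (hypothetical values, not from the dataset).
avg_price <- c(12.5, 15.5, 18, 20, 22.5, 25, 27, 30, 35, 45, 120, 390)

# First and third quartiles, then the upper fence Q3 + 1.5 * IQR.
q <- quantile(avg_price, c(0.25, 0.75), names = FALSE)
upper_fence <- q[2] + 1.5 * (q[2] - q[1])

# Prices above the fence are the points a boxplot would draw as outliers.
outliers <- avg_price[avg_price > upper_fence]

# In a right-skewed distribution the mean is pulled above the median.
c(mean = mean(avg_price), median = median(avg_price))
print(outliers)
```

A mean well above the median is a quick numeric check for the right skew that the histogram shows visually.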

For Price Level

It is evident in this bar plot that most of the restaurants are mid-range, aligning with what was observed in the average price plot above. This reinforces the idea that the food offered in the majority of the restaurants in this dataset is affordable and potentially budget-friendly.

For Special Diets

This plot shows that most of the restaurants on TripAdvisor do not offer special diets in their menus. However, there is some presence of vegetarian options. It is also possible that, for some restaurants, this information was unknown and was therefore recorded as not offering special diets. Thus, any measured impact of special diets may be inaccurate and is not meaningful for this EDA.

For Cuisines

This plot shows that most of the restaurants, more than 10,000, offer European cuisine. This makes sense, since the restaurants I am exploring are located in different European cities.

A moderate number of restaurants, around 1,875, also operate as bars. Asian cuisine is offered by around 1,250 restaurants. African and North American cuisines have a lower presence on the menus of European restaurants. Fusion and South American cuisines are barely offered, and Oceanian cuisine has the lowest presence in this database.

Multivariate Analysis

Location of the restaurants

This latitude vs. longitude plot shows that most restaurants are located in France. This reinforces the univariate analysis, which indicated that France has the highest number of restaurants in the dataset.

Food Top and Bottom Ratings

Since there are a lot of observations, a plot of all of them would be hard to read. Thus, to make the analysis easier to follow, I decided to narrow the observations studied. Since most of the restaurants offer European cuisine, I decided to explore those restaurants to make the EDA more meaningful.

This filtered dataset will be used to explore the food, service and value ratings in this section.
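A minimal sketch of this filtering, followed by picking the best and worst food ratings, is shown below. The `cuisines` column name, its comma-separated format and the restaurant names A to E are assumptions; the report looks at the top 20, while the toy example keeps only 2.

```r
library(dplyr)
library(stringr)

# Toy data; names and cuisine strings are hypothetical.
restaurants <- data.frame(
  restaurant_name = c("A", "B", "C", "D", "E"),
  cuisines        = c("French, European", "Asian", "European", "Bar, European", NA),
  food_rating     = c(5, 4.5, 2, 1.5, 4)
)

# Keep restaurants whose cuisine string mentions "European".
# Rows with a missing cuisine are dropped, because filter() discards NA.
european <- restaurants %>%
  filter(str_detect(cuisines, "European"))

# Highest and lowest food ratings (use n = 20 on the full data).
top_food    <- european %>% slice_max(food_rating, n = 2, with_ties = FALSE)
bottom_food <- european %>% slice_min(food_rating, n = 2, with_ties = FALSE)

print(top_food)
print(bottom_food)
```

The same `european` data frame can be reused for the service and value ratings by swapping the column passed to `slice_max()` and `slice_min()`.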

This plot shows that the top 20 restaurants possess a food rating of 5 out of 5. This suggests that European cuisine is not only affordable, as the previous section's analysis showed, but also very tasty.

Another interesting finding is that these restaurants with the top food ratings are French, which is consistent with the overall trend of high performance and strong presence of restaurants in France.

This plot shows that most of the restaurants at the bottom have a low food rating of 2.0. One restaurant from the Flunch chain has a slightly higher food rating of 2.5. The lowest food rating is 1.5, from Don & Donna.

restaurant_name country food_rating avg_price price_level
Don & Donna Greece 1.5 57.5 expensive
Flunch France 2.0 15.0 cheap
Flunch France 2.0 15.5 cheap
Flunch France 2.5 15.5 cheap

It is interesting that Don & Donna, located in Greece, is still marked as expensive despite its low food rating, while restaurants from the French chain Flunch, with food ratings between 2.0 and 2.5, are cheap, with prices around 15 euros.

Service Top and Bottom Ratings

This plot shows that the restaurants with the highest service rating, 5 out of 5, are French.

This plot shows the restaurants with the lowest service rating. The most common lowest rating is 2.5, followed by 2.0. The lowest service rating belongs to Don & Donna, which also has the lowest food rating as seen previously.

The Flunch chain appears again, meaning that it has not only a low food rating but also a low service rating.

Value Top and Bottom Ratings

This plot shows that the restaurants with the highest value rating, 5 out of 5, are French.

This plot shows that the lowest value ratings are mostly 2. Notably, Don & Donna appears again at the bottom.

Price and Rating Relation

This plot shows that average rating and price are not directly proportional: an expensive restaurant does not necessarily have a high rating. For example, a restaurant with a menu priced around 390 euros has a rating of 4.5, while another restaurant with a menu around 50 euros has a higher rating of 5. Thus, it can be inferred that factors other than price, such as experience and quality, matter to consumers when rating restaurants.

This logic is reinforced when looking at the most expensive restaurants. In particular, Brasserie Og Restaurant NO76, a Danish restaurant, is the most expensive restaurant in this dataset, yet has an average rating of 4.5. This rating does not reflect poorly on the restaurant, and it could contribute to why the restaurant is more expensive than others.

Nevertheless, Au Bon Accueil, a French restaurant, has an average rating below 3.75 and an average price around 275 euros. It is questionable how a restaurant can charge such a substantial price with a less than meritorious rating.

Hence, it can be concluded that while price exerts some influence on a restaurant’s average rating, and vice versa, additional factors (namely the ambience, food quality, and service) significantly shape the overall experience for each patron, thereby influencing the performance of the restaurant at large.
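One way to put a number on how loosely price and rating move together is a rank correlation. The sketch below uses base R on a handful of made-up price and rating pairs, so the resulting value says nothing about the real dataset; it only illustrates the calculation.

```r
# Toy price/rating pairs (hypothetical values for illustration only).
restaurants <- data.frame(
  avg_price  = c(15.5, 27, 50, 57.5, 275, 390),
  avg_rating = c(4.5, 4, 5, 1.5, 3.5, 4.5)
)

# Spearman's rank correlation is less sensitive to extreme prices than Pearson's.
rho <- cor(
  restaurants$avg_price,
  restaurants$avg_rating,
  method = "spearman",
  use = "complete.obs"  # ignore rows with missing price or rating
)

print(rho)
```

A value close to 0 would support the reading that price alone says little about rating, while a value near 1 or -1 would point to a strong monotonic relationship.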

Exploring particular restaurants

Don & Donna

As explored before, Don & Donna has appeared at the bottom of the food, service and value ratings:

restaurant_name food_rating service_rating value_rating avg_price price_level
Don & Donna 1.5 1.5 1 57.5 expensive

Thus, it is possible to infer that Don & Donna is the worst restaurant in this dataset based on its food, service and value ratings. Still, it is interesting that even though its ratings are bad, its prices are expensive, averaging 57.5 euros. This suggests that Greek restaurants may be expensive regardless of their rating.

Flunch

Flunch, the French chain, appeared among the chains with the most restaurants, yet it also appeared among the lowest food and service ratings, as seen previously in the EDA.

restaurant_name city food_rating service_rating value_rating avg_rating avg_price price_level
Flunch Mers-les-Bains 3.5 4.0 4.0 3.5 15.5 cheap
Flunch Poitiers 3.5 3.5 4.0 3.0 15.5 cheap
Flunch Herouville-Saint-Clair 3.5 3.5 4.0 3.5 15.5 cheap
Flunch Strasbourg 3.5 3.5 3.5 3.0 16.5 cheap
Flunch Clermont-Ferrand 3.5 3.5 3.5 3.0 15.5 cheap
Flunch Charleville-Mezieres 3.5 3.5 3.5 3.0 15.5 cheap
Flunch Nantes 3.0 3.5 3.5 3.0 15.5 mid-range
Flunch Roanne City 3.0 3.5 3.5 3.0 15.5 mid-range
Flunch Moulins 3.0 3.5 3.5 3.0 15.5 cheap
Flunch Manosque 3.0 3.5 3.5 3.0 15.5 cheap
Flunch Pertuis 3.0 3.5 3.5 3.0 15.5 mid-range
Flunch Villebon-sur-Yvette 3.0 3.0 3.5 2.5 15.5 cheap
Flunch Antibes 3.0 3.0 3.5 2.5 15.5 cheap
Flunch Macon 3.0 3.0 3.5 3.0 15.5 cheap
Flunch Boulogne-sur-Mer 3.0 3.0 3.5 3.0 15.5 cheap
Flunch Cholet 3.0 3.0 3.5 3.0 15.5 mid-range
Flunch Amiens 3.0 3.0 3.5 2.5 15.5 cheap
Flunch Chambery 3.0 3.0 3.5 2.5 15.5 cheap
Flunch Besancon 3.0 3.0 3.0 3.0 15.5 cheap
Flunch Noyelles-Godault 3.0 3.0 3.0 3.0 15.5 cheap
Flunch Avignon 2.5 3.0 3.5 2.5 15.5 cheap
Flunch Franconville 2.5 3.0 3.0 2.5 15.5 cheap
Flunch Le Quesnoy 2.5 3.0 3.0 2.5 15.5 cheap
Flunch Tours 2.5 3.0 3.0 3.0 15.5 cheap
Flunch Thionville 2.5 3.0 3.0 2.5 15.5 cheap
Flunch Bordeaux 2.5 2.5 3.0 2.5 15.5 cheap
Flunch Pau 2.5 2.5 3.0 2.5 17.0 cheap
Flunch Vitrolles 2.5 2.5 3.0 2.5 15.5 cheap
Flunch Le Pontet 2.5 2.5 3.0 2.5 15.5 cheap
Flunch Bonneuil-sur-Marne 2.5 2.5 3.0 2.5 15.5 cheap
Flunch Epagny 2.5 2.5 2.5 2.0 15.5 cheap
Flunch Saint-Jean-de-la-Ruelle 2.0 2.5 3.0 2.5 15.5 cheap
Flunch Montbeliard 2.0 2.0 2.5 2.0 15.0 cheap

Moreover, the following table lists the French restaurants with the lowest ratings in food, service and value:

restaurant_name food_rating service_rating value_rating avg_rating avg_price price_level
Les Chandelles 2.0 2.0 2.0 2.0 18.0 mid-range
La Confiance 2.0 2.0 2.0 2.0 27.0 mid-range
Flunch 2.0 2.0 2.5 2.0 15.0 cheap
Brasserie de l'Evéché 2.0 2.5 2.0 2.0 17.5 mid-range
Le Saint Clair 2.0 2.5 2.0 2.5 18.5 mid-range
Les Comptoirs Casino 2.0 2.5 2.0 2.0 12.5 cheap
Flunch 2.0 2.5 3.0 2.5 15.5 cheap
Mecenate 2.5 2.0 2.5 2.5 22.5 mid-range
A La Maree 2.5 2.5 2.0 2.0 17.0 mid-range
Del Arte Chartres 2.5 2.5 2.5 2.5 21.0 mid-range
Le Molière 2.5 2.5 2.5 2.0 30.0 mid-range
A la maree 2.5 2.5 2.5 2.5 31.0 mid-range
L'Exocet 2.5 2.5 2.5 2.5 29.0 mid-range
Brasserie Les Platanes 2.5 2.5 2.5 2.5 13.5 mid-range
Restaurant Del Arte Annecy 2.5 2.5 2.5 2.5 21.5 mid-range

It is clear that Flunch is not the worst restaurant in France, since that title goes to Les Chandelles and La Confiance.

Nevertheless, consider other French restaurants with a similar average price and price level:

restaurant_name food_rating service_rating value_rating avg_rating avg_price price_level
Bar a Huitres 4.5 4.5 5.0 4.5 15.5 cheap
Le fromage rit 4.5 4.5 4.5 4.5 15.5 mid-range
La P'tite Franquette 4.5 4.5 4.5 4.5 15.5 mid-range
La Table de Charbon-Blanc 4.5 4.5 4.5 4.0 15.5 mid-range
le O2 verdun 4.5 4.5 4.5 4.5 15.5 mid-range
La Cocotte des Halles 4.5 4.5 4.5 4.5 15.5 mid-range
Restaurant Plus Belle La Vie 4.5 4.5 4.5 4.5 15.5 mid-range
La Table du Malvan 4.5 4.5 4.5 4.5 15.5 mid-range
L'Eau Vive 4.5 4.5 4.5 4.5 15.5 mid-range
O Patio 4.5 4.5 4.5 4.5 15.5 mid-range
Le Goëlic 4.5 4.5 4.5 4.5 15.5 mid-range
O Ptit Paradis 4.5 4.5 4.5 4.5 15.5 mid-range
Le Tablier 4.5 4.5 4.5 4.5 15.5 mid-range
Buron des Bouals 4.5 4.5 4.5 4.5 15.5 mid-range
Le Chancel 4.5 4.5 4.5 4.5 15.5 mid-range

The table above shows that customers can find places with good food and great service without spending more. Restaurants like La Table de Charbon-Blanc and Restaurant Plus Belle La Vie show that French restaurants can be easy on the wallet while still offering a great experience.
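A comparison like this one could be produced by filtering French restaurants to those priced close to Flunch and sorting them by rating. In the sketch below the four rows reuse values from the tables in this memo, but the `reference_price` variable and the 1 euro tolerance are assumptions, since the exact selection rule is not stated.

```r
library(dplyr)

# A few rows with values taken from the tables in this memo.
france <- data.frame(
  restaurant_name = c("Flunch", "Bar a Huitres", "Le Chancel", "Les Chandelles"),
  avg_rating      = c(2.5, 4.5, 4.5, 2.0),
  avg_price       = c(15.5, 15.5, 15.5, 18.0),
  price_level     = c("cheap", "cheap", "mid-range", "mid-range")
)

reference_price <- 15.5  # typical Flunch price (assumed reference point)

peers <- france %>%
  filter(
    restaurant_name != "Flunch",
    abs(avg_price - reference_price) <= 1  # within 1 euro (assumed tolerance)
  ) %>%
  arrange(desc(avg_rating))

print(peers)
```

Widening or narrowing the tolerance changes how many peers are returned, so it is worth stating the chosen window next to the table.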

Exploring France a bit deeper

Throughout this EDA, France had a strong presence, since most of the restaurants are located there. French restaurants also appeared among both the top and bottom restaurants when looking at prices and ratings. Thus, I decided to explore restaurants located in France in particular.

Price level and Ratings

restaurant_name avg_rating price_level
L'Auberge de la Brie 5 expensive
The Oystercatcher 5 cheap
Creperie Ty Gwechall 5 mid-range
La Maison 5 expensive
Le Clos de la Prairie 5 expensive
L'Orée de la Forêt 5 expensive
Restaurant de La Gare de Percy 5 mid-range
Bon thé Bonheur 5 expensive
Le Savignois 5 mid-range
Restaurant BK 5 mid-range
Les Chars A Bancs 5 mid-range
La Cote d'Armor 5 mid-range
La Goguette 5 expensive
Don Ulpiano 5 mid-range
Tea & Ty 5 mid-range
Copain Copine 5 mid-range
Restaurant Dolce Vita 5 expensive
Les Antiquaires 5 mid-range
AU CREMIER GOURMAND 5 mid-range
Pause Saveurs 5 cheap
Le 36 Bonap 5 cheap
Dolce Italia 5 cheap
La Terrasse Thalassoleil 5 mid-range
Influences Sud Ouest 5 mid-range
LETREIZH Comptoir breton 5 cheap
Le Neuvieme Art 5 expensive
Génépi HOTEL 5 mid-range
Auberge du Grand Megnos 5 mid-range
Nature Gourmande 5 expensive
Un p'ti temps K 5 cheap
Mariottat 5 expensive
Le P'tit Roseau 5 expensive
Maison Lameloise 5 expensive
Aromatique Restaurant 5 expensive
Bartavelle 5 expensive
La Trencadis 5 expensive
La Croissanterie du Lac 5 cheap
Cote Table 5 mid-range
Auberge Chez Guth 5 expensive
Au Faitout 5 expensive
Restaurant au boeuf rouge 5 expensive
Auberge de la Tourre 5 mid-range
Chez Cécile 5 mid-range
Restaurant Le Very'table 5 mid-range
Famille Moutier 5 mid-range
A la Table de Chanelle 5 expensive
Le Manege Des Saisons 5 mid-range
Le Pelican 5 expensive
Ti Blazenn 5 expensive
Le Mouton Noir 5 mid-range
Le Vicomte 5 mid-range
Le Jour Du Poisson 5 expensive
Auberge des 4 Chemins 5 mid-range
CoquiThau 5 cheap
le trou normand 5 expensive
ARGI-EDER 5 expensive
Chez Marie 5 mid-range
Au 14 Fevrier 5 expensive
Les Sources du Moulin 5 mid-range
Ti Henri 5 mid-range
Le Clocher des Pères 5 expensive

Through this bar plot, it is evident that the majority of French restaurants fall into the mid-range category. This suggests that French restaurants offer a variety of services and cuisines that are affordable for consumers. Moreover, their service remains excellent, as indicated in the table, which shows that some top-rated restaurants with an average rating of 5 also belong to the mid-range category.

Additionally, the plot suggests that French restaurants cater to diverse customer budgets. There are upscale, expensive restaurants for special occasions, as well as mid-range and affordable options for casual or informal gatherings. Regardless of the price level customers are seeking, they can still find restaurants with great food, value, and service, as illustrated in the table listing restaurants from cheap to expensive, all with an average rating of 5.

How many French restaurants have an award?

This pie chart shows that the majority of French restaurants have received awards. This underscores the excellent culinary service that French restaurants offer. These accolades not only enhance their reputation but may also contribute to the higher costs associated with some French restaurants. Winning awards can affect not only the customer experience but also the pricing of their menus.
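The shares behind a pie chart like this can be computed directly. The sketch below assumes a hypothetical `awards` text column in which a missing or empty value means no award; the real encoding may differ, and the rows are toy data.

```r
library(dplyr)

# Toy data; the `awards` column and its encoding are assumptions.
france <- data.frame(
  restaurant_name = c("A", "B", "C", "D"),
  awards          = c("award text", NA, "award text", "award text")
)

award_share <- france %>%
  # Treat both NA and empty strings as "no award".
  mutate(has_award = !is.na(awards) & awards != "") %>%
  count(has_award) %>%
  mutate(share = n / sum(n))

print(award_share)
```

Because unknown values are counted as "no award" here, the share of awarded restaurants should be read as a lower bound.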

Conclusion